Automatic Detection of Intra-Word Code-Switching
Authors
Abstract
Many people are multilingual and may draw on multiple language varieties when writing messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. We then identify words composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
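The two steps described in the abstract (segment a word into subunits, then check whether the subunits are associated with different languages) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how segmentation or language association is performed, so the sketch assumes dictionary-based segmentation over small per-language subunit lexicons. The lexicon contents, the language names `lang_a` and `lang_b`, and the function names `segment`, `languages_of`, and `is_intra_word_switch` are all hypothetical placeholders.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy lexicons of subunits per language. The entries are
# placeholders for illustration only, not data from the paper.
LEXICONS: Dict[str, Set[str]] = {
    "lang_a": {"foo", "bar", "en"},
    "lang_b": {"baz", "qux", "en"},
}


def segment(word: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split word into known subunits, preferring the fewest pieces.

    Returns None when the word cannot be fully covered by the vocabulary.
    """
    # best[i] holds the shortest segmentation found for word[:i].
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is not None and piece in vocab:
                candidate = prefix + [piece]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    return best[len(word)]


def languages_of(unit: str) -> Set[str]:
    """Return every language whose lexicon contains the subunit."""
    return {lang for lang, units in LEXICONS.items() if unit in units}


def is_intra_word_switch(word: str) -> bool:
    """Flag a word whose unambiguous subunits come from more than one language."""
    vocab = set().union(*LEXICONS.values())
    units = segment(word.lower(), vocab)
    if units is None:
        return False  # the word could not be segmented with the toy lexicons
    seen: Set[str] = set()
    for unit in units:
        langs = languages_of(unit)
        if len(langs) == 1:  # subunits shared by several languages are ignored
            seen |= langs
    return len(seen) > 1


if __name__ == "__main__":
    for w in ["foobaz", "foobar", "fooen"]:
        print(w, is_intra_word_switch(w))
```

Subunits that appear in more than one lexicon are treated as uninformative in this sketch, since closely related varieties such as Dutch and Limburgish can be expected to share many subunits. A realistic system would need learned segmentation and probabilistic language association rather than exact dictionary lookup.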
Similar resources
Pattern Matching Refinements to Dictionary-Based Code-Switching Point Detection
This study presents the development and evaluation of pattern matching refinements (PMRs) to automatic code-switching point (CSP) detection. With all PMRs applied, evaluation showed an accuracy of 94.51%. This is an improvement over the reported accuracy rates of dictionary-based approaches, which are in the range of 75.22%-76.26% (Yeong and Tan, 2010). In our experiments, a 100-sentence Tagalog-English corpu...
The Effect of Intra-sentential, Inter-sentential and Tag-sentential Switching on Teaching Grammar
The present study examined the comparative effect of different types of code-switching, i.e., intra-sentential, inter-sentential, and tag-sentential switching, on EFL learners' grammar learning and teaching. To this end, a sample of 60 Iranian female and male students in two different institutions in Qazvin was selected. They were assigned to four groups. Each group was randomly assigned to one of the...
Addressing Code-Switching in French/Algerian Arabic Speech
This study focuses on code-switching (CS) in French/Algerian Arabic bilingual communities and investigates how speech technologies, such as automatic data partitioning, language identification and automatic speech recognition (ASR) can serve to analyze and classify this type of bilingual speech. A preliminary study carried out using a corpus of Maghrebian broadcast data revealed a relatively hi...
Speech Recognition on English-Mandarin Code-Switching Data using Factored Language Models - with Part-of-Speech Tags, Language ID and Code-Switch Point Probability as Factors
Code-switching is defined as "the alternate use of two or more languages in the same utterance or conversation" [1]. CS is a widespread phenomenon in multilingual communities, where multiple languages are concurrently used in a conversation. For automatic speech recognition (ASR), particularly intra-sentential code-switching poses an interesting challenge due to the multilingual context for la...
Mixed Language and Code-Switching in the Canadian Hansard
While there has been lots of interest in code-switching in informal text such as tweets and online content, we ask whether code-switching occurs in the proceedings of multilingual institutions. We focus on the Canadian Hansard, and automatically detect mixed language segments based on simple corpus-based rules and an existing word-level language tagger. Manual evaluation shows that the performa...